Generative Adversarial Network

A generative adversarial network (GAN) is a class of machine learning frameworks designed by Ian Goodfellow and his colleagues in June 2014. Two neural networks contest with each other in the form of a zero-sum game, where one agent's gain is another agent's loss. Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics. Though originally proposed as a form of generative model for unsupervised learning, GANs have also proved useful for semi-supervised learning, fully supervised learning, and reinforcement learning. The core idea of a GAN is based on "indirect" training through the discriminator, another neural network that can tell how "realistic" the input seems, and which is itself updated dynamically. This means that the generator is not trained to minimize the distance to a specific image, but rather to fool the discriminator. This enables the model to learn in an unsupervised manner. GANs are similar to mimicry in evolutionary biology, with an evolutionary arms race between the two networks.


Definition


Mathematical

The original GAN is defined as the following game:

Each probability space (\Omega, \mu_{\text{ref}}) defines a GAN game. There are two players: the generator and the discriminator. The generator's strategy set is \mathcal P(\Omega), the set of all probability measures \mu_G on \Omega. The discriminator's strategy set is the set of Markov kernels \mu_D: \Omega \to \mathcal P([0, 1]), where \mathcal P([0, 1]) is the set of probability measures on [0, 1].

The GAN game is a zero-sum game, with objective function

L(\mu_G, \mu_D) := \mathbb E_{x\sim \mu_{\text{ref}}, y\sim \mu_D(x)}[\ln y] + \mathbb E_{x\sim \mu_G, y\sim \mu_D(x)}[\ln (1-y)]

The generator aims to minimize the objective, and the discriminator aims to maximize the objective.
The generator's task is to approach \mu_G \approx \mu_{\text{ref}}, that is, to match its own output distribution as closely as possible to the reference distribution. The discriminator's task is to output a value close to 1 when the input appears to come from the reference distribution, and a value close to 0 when the input looks like it came from the generator distribution.


In practice

The ''generative'' network generates candidates while the ''discriminative'' network evaluates them. The contest operates in terms of data distributions. Typically, the generative network learns to map from a latent space to a data distribution of interest, while the discriminative network distinguishes candidates produced by the generator from the true data distribution. The generative network's training objective is to increase the error rate of the discriminative network, that is, to "fool" the discriminator by producing novel candidates that the discriminator judges not to be synthesized (to be part of the true data distribution). A known dataset serves as the initial training data for the discriminator. Training involves presenting it with samples from the training dataset until it achieves acceptable accuracy. The generator is trained based on whether it succeeds in fooling the discriminator. Typically, the generator is seeded with randomized input sampled from a predefined latent space (e.g. a multivariate normal distribution). Thereafter, candidates synthesized by the generator are evaluated by the discriminator. Independent backpropagation procedures are applied to both networks so that the generator produces better samples, while the discriminator becomes more skilled at flagging synthetic samples. When used for image generation, the generator is typically a deconvolutional neural network, and the discriminator is a convolutional neural network.
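This alternating procedure can be sketched in a few lines of code. The following is a minimal, illustrative PyTorch sketch, not a reference implementation: the tiny fully connected networks, dimensions, and hyperparameters are assumptions chosen purely for illustration, and the generator uses the non-saturating loss recommended in the original paper.

    import torch
    import torch.nn as nn

    # Toy networks; real GANs use much deeper (often convolutional) architectures.
    G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())
    D = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))

    opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCEWithLogitsLoss()

    def train_step(real):                      # real: (batch, 784) tensor
        batch = real.size(0)
        z = torch.randn(batch, 64)             # sample from the latent space
        fake = G(z)

        # Discriminator update: push D(real) toward 1 and D(fake) toward 0.
        opt_D.zero_grad()
        loss_D = bce(D(real), torch.ones(batch, 1)) + \
                 bce(D(fake.detach()), torch.zeros(batch, 1))
        loss_D.backward()
        opt_D.step()

        # Generator update: fool the discriminator (non-saturating loss).
        opt_G.zero_grad()
        loss_G = bce(D(fake), torch.ones(batch, 1))
        loss_G.backward()
        opt_G.step()
        return loss_D.item(), loss_G.item()

A real implementation would loop train_step over minibatches drawn from a dataset.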


Relation to other statistical machine learning methods

GANs are implicit generative models, which means that they do not explicitly model the likelihood function nor provide a means for finding the latent variable corresponding to a given sample, unlike alternatives such as flow-based generative models. Compared to fully visible belief networks such as WaveNet and PixelRNN, and autoregressive models in general, GANs can generate one complete sample in a single pass, rather than requiring multiple passes through the network. Compared to Boltzmann machines and nonlinear ICA, there is no restriction on the type of function used by the network. Since neural networks are universal approximators, GANs are asymptotically consistent. Variational autoencoders might be universal approximators, but this had not been proven as of 2017.


Mathematical properties


Measure-theoretic considerations

This section provides some of the mathematical theory behind these methods. In modern probability theory based on measure theory, a probability space also needs to be equipped with a σ-algebra. As a result, a more rigorous definition of the GAN game would make the following changes:

Each probability space (\Omega, \mathcal B, \mu_{\text{ref}}) defines a GAN game. The generator's strategy set is \mathcal P(\Omega, \mathcal B), the set of all probability measures \mu_G on the measure space (\Omega, \mathcal B). The discriminator's strategy set is the set of Markov kernels \mu_D: (\Omega, \mathcal B) \to \mathcal P([0, 1], \mathcal B([0, 1])), where \mathcal B([0, 1]) is the Borel σ-algebra on [0, 1].
Since issues of measurability never arise in practice, these will not concern us further.


Choice of the strategy set

In the most generic version of the GAN game described above, the strategy set for the discriminator contains all Markov kernels \mu_D: \Omega \to \mathcal P([0, 1]), and the strategy set for the generator contains arbitrary probability distributions \mu_G on \Omega. However, as shown below, the optimal discriminator strategy against any \mu_G is deterministic, so there is no loss of generality in restricting the discriminator's strategies to deterministic functions D: \Omega \to [0, 1]. In most applications, D is a deep neural network function.

As for the generator, while \mu_G could theoretically be any computable probability distribution, in practice it is usually implemented as a pushforward: \mu_G = \mu_Z \circ G^{-1}. That is, start with a random variable z \sim \mu_Z, where \mu_Z is a probability distribution that is easy to sample (such as the uniform distribution, or the Gaussian distribution), then define a function G: \Omega_Z \to \Omega. Then the distribution \mu_G is the distribution of G(z). Consequently, the generator's strategy is usually defined as just G, leaving z \sim \mu_Z implicit. In this formalism, the GAN game objective is

L(G, D) := \mathbb E_{x\sim \mu_{\text{ref}}}[\ln D(x)] + \mathbb E_{z\sim \mu_Z}[\ln (1-D(G(z)))]
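As an illustration of the pushforward construction, the following sketch draws samples z from a standard Gaussian \mu_Z and pushes them through a map G to obtain samples from \mu_G = \mu_Z \circ G^{-1}. The particular map G here is a toy choice made purely for illustration; a real generator is a trained neural network.

    import numpy as np

    rng = np.random.default_rng(0)

    def G(z):
        # A toy deterministic map; a trained generator plays this role in a GAN.
        return np.tanh(2.0 * z) + 0.1 * z

    z = rng.standard_normal(10_000)   # z ~ mu_Z, here a standard Gaussian
    x = G(z)                          # x ~ mu_G, the pushforward of mu_Z under G

    print(x.mean(), x.std())          # empirical statistics of mu_G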


Generative reparametrization

The GAN architecture has two main components. One is casting optimization into a game, of the form \min_G \max_D L(G, D), which is different from the usual kind of optimization, of the form \min_\theta L(\theta). The other is the decomposition of \mu_G into \mu_Z \circ G^{-1}, which can be understood as a reparametrization trick. To see its significance, one must compare GAN with previous methods for learning generative models, which were plagued by "intractable probabilistic computations that arise in maximum likelihood estimation and related strategies". At the same time, Kingma and Welling and Rezende et al. developed the same idea of reparametrization into a general stochastic backpropagation method. Among its first applications was the variational autoencoder.


Move order and strategic equilibria

In the original paper, as well as most subsequent papers, it is usually assumed that the generator ''moves first'' and the discriminator ''moves second'', giving the following minimax game:

\min_{\mu_G}\max_{\mu_D} L(\mu_G, \mu_D) := \mathbb E_{x\sim \mu_{\text{ref}}, y\sim \mu_D(x)}[\ln y] + \mathbb E_{x\sim \mu_G, y\sim \mu_D(x)}[\ln (1-y)]

If both the generator's and the discriminator's strategy sets are spanned by a finite number of strategies, then by the minimax theorem

\min_{\mu_G}\max_{\mu_D} L(\mu_G, \mu_D) = \max_{\mu_D}\min_{\mu_G} L(\mu_G, \mu_D)

that is, the move order does not matter. However, since the strategy sets are not finitely spanned, the minimax theorem does not apply, and the idea of an "equilibrium" becomes delicate. To wit, there are the following different concepts of equilibrium:

* Equilibrium when the generator moves first and the discriminator moves second:
\hat \mu_G \in \arg\min_{\mu_G}\max_{\mu_D} L(\mu_G,\mu_D), \quad \hat \mu_D \in \arg\max_{\mu_D} L(\hat\mu_G, \mu_D)
* Equilibrium when the discriminator moves first and the generator moves second:
\hat \mu_D \in \arg\max_{\mu_D}\min_{\mu_G} L(\mu_G, \mu_D), \quad \hat \mu_G \in \arg\min_{\mu_G} L(\mu_G,\hat \mu_D)
* Nash equilibrium (\hat \mu_D, \hat\mu_G), which is stable under simultaneous moves:
\hat \mu_D \in \arg\max_{\mu_D} L(\hat\mu_G, \mu_D), \quad \hat \mu_G \in \arg\min_{\mu_G} L(\mu_G, \hat\mu_D)

For general games, these equilibria do not have to agree, or even to exist. For the original GAN game, these equilibria all exist and are all equal. However, for more general GAN games, they do not necessarily exist or agree.


Main theorems for GAN game

The original GAN paper proved two theorems about the optimal strategies: for any fixed generator \mu_G, the optimal discriminator is D(x) = \frac{\mu_{\text{ref}}(dx)}{\mu_{\text{ref}}(dx) + \mu_G(dx)}; and against an optimal discriminator, the unique optimal generator is \mu_G = \mu_{\text{ref}}, at which point the objective attains its minimum value -\ln 4.

Interpretation: For any fixed generator strategy \mu_G, the optimal discriminator keeps track of the likelihood ratio between the reference distribution and the generator distribution:

\frac{D(x)}{1-D(x)} = \frac{d\mu_{\text{ref}}}{d\mu_G}(x) = \frac{\mu_{\text{ref}}(dx)}{\mu_G(dx)}; \quad D(x) = \sigma(\ln\mu_{\text{ref}}(dx) - \ln\mu_G(dx))

where \sigma is the logistic function. In particular, if the prior probability that an image x comes from the reference distribution equals \frac{1}{2}, then D(x) is just the posterior probability that x came from the reference distribution:

D(x) = \Pr(x \text{ came from } \mu_{\text{ref}} \mid x)
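For intuition, the optimal discriminator can be computed in closed form when both densities are known. A small sketch with two Gaussians; the particular densities are illustrative choices, not anything prescribed by the theory:

    import numpy as np
    from scipy.stats import norm

    # Illustrative densities: reference N(0, 1), generator N(1, 1).
    p_ref = lambda x: norm.pdf(x, loc=0.0, scale=1.0)
    p_gen = lambda x: norm.pdf(x, loc=1.0, scale=1.0)

    def optimal_D(x):
        # D*(x) = p_ref(x) / (p_ref(x) + p_gen(x)),
        # equivalently sigma(log p_ref(x) - log p_gen(x)).
        return p_ref(x) / (p_ref(x) + p_gen(x))

    xs = np.linspace(-3, 4, 8)
    print(optimal_D(xs))  # near 1 where the reference dominates, near 0 where the generator does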


Training and evaluating GAN


Training


Unstable convergence

While the GAN game has a unique global equilibrium point when both the generator and discriminator have access to their entire strategy sets, the equilibrium is no longer guaranteed when they have a restricted strategy set. In practice, the generator has access only to measures of the form \mu_Z \circ G_\theta^{-1}, where G_\theta is a function computed by a neural network with parameters \theta, and \mu_Z is an easily sampled distribution, such as the uniform or normal distribution. Similarly, the discriminator has access only to functions of the form D_\zeta, a function computed by a neural network with parameters \zeta. These restricted strategy sets take up a ''vanishingly small proportion'' of their entire strategy sets. Further, even if an equilibrium still exists, it can only be found by searching the high-dimensional space of all possible neural network functions. The standard strategy of using gradient descent to find the equilibrium often does not work for GANs, and often the game "collapses" into one of several failure modes.


Mode collapse

GANs often suffer from mode collapse, where they fail to generalize properly, missing entire modes from the input data. For example, a GAN trained on the MNIST dataset, which contains many samples of each digit, might only generate pictures of digit 0. This was named the "Helvetica scenario" in the original paper. One way this can happen is if the generator learns too fast compared to the discriminator. If the discriminator D is held constant, then the optimal generator would only output elements of \arg\max_x D(x). So, for example, if during GAN training on MNIST the discriminator for a few epochs somehow prefers the digit 0 slightly more than other digits, the generator may seize the opportunity to generate only digit 0, and then be unable to escape the local minimum after the discriminator improves. Some researchers perceive the root problem to be a weak discriminative network that fails to notice the pattern of omission, while others assign blame to a bad choice of objective function. Many solutions have been proposed, but it is still an open problem. Even the state-of-the-art architecture BigGAN (2019) could not avoid mode collapse; its authors resorted to "allowing collapse to occur at the later stages of training, by which time a model is sufficiently trained to achieve good results".


Two time-scale update rule

The two time-scale update rule (TTUR) was proposed to make GAN convergence more stable by making the learning rate of the generator lower than that of the discriminator. The authors argued that the generator should move slower than the discriminator, so that it does not "drive the discriminator steadily into new regions without capturing its gathered information". They proved that a general class of games that includes the GAN game, when trained under TTUR, "converges under mild assumptions to a stationary local Nash equilibrium". They also proposed using the Adam stochastic optimizer to avoid mode collapse, as well as the Fréchet inception distance for evaluating GAN performance.
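In code, TTUR amounts to nothing more than giving the two networks different learning rates. A minimal PyTorch sketch; the stand-in linear networks and the particular 4:1 learning-rate ratio are assumptions here (the ratio follows common practice in later work such as SAGAN, not a universal prescription):

    import torch
    import torch.nn as nn

    G = nn.Linear(64, 784)   # stand-in generator
    D = nn.Linear(784, 1)    # stand-in discriminator

    # TTUR: the generator learns more slowly than the discriminator.
    opt_G = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.0, 0.9))
    opt_D = torch.optim.Adam(D.parameters(), lr=4e-4, betas=(0.0, 0.9))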


Vanishing gradient

Conversely, if the discriminator learns too fast compared to the generator, then the discriminator could almost perfectly distinguish \mu_{\text{ref}} and \mu_{G_\theta}. In that case, the generator G_\theta could be stuck with a very high loss no matter which direction it changes \theta, meaning that the gradient \nabla_\theta L(G_\theta, D_\zeta) would be close to zero. In that case, the generator cannot learn, a case of the vanishing gradient problem. Intuitively speaking, the discriminator is too good, and since the generator cannot take any small step (only small steps are considered in gradient descent) that improves its payoff, it does not even try. One important method for solving this problem is the Wasserstein GAN.


Evaluation

GANs are usually evaluated by the Inception score (IS), which measures how varied the generator's outputs are (as classified by an image classifier, usually Inception-v3), or by the Fréchet inception distance (FID), which measures how similar the generator's outputs are to a reference set (as compared by a learned image featurizer, such as Inception-v3 without its final layer). Many papers that propose new GAN architectures for image generation report how their architectures break the state of the art on FID or IS. Another evaluation method is the Learned Perceptual Image Patch Similarity (LPIPS), which starts with a learned image featurizer f_\theta: \text{Image} \to \R^n and fine-tunes it by supervised learning on a set of triples (x, x', \text{perc}(x, x')), where x is an image, x' is a perturbed version of it, and \text{perc}(x, x') is how much they differ, as reported by human subjects. The model is fine-tuned so that \| f_\theta(x) - f_\theta(x')\| \approx \text{perc}(x, x'). This fine-tuned model is then used to define \text{LPIPS}(x, x') := \| f_\theta(x) - f_\theta(x')\|. Other evaluation methods are reviewed in the literature.
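Given feature vectors for the real and generated sets, FID fits a Gaussian to each and computes the Fréchet distance between them. A sketch of that final computation; the features themselves would come from a pretrained featurizer such as Inception-v3, which is assumed and not shown, and the random inputs below are placeholders:

    import numpy as np
    from scipy import linalg

    def frechet_distance(feats_real, feats_fake):
        """FID between two sets of feature vectors, each of shape (n_samples, dim)."""
        mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
        cov1 = np.cov(feats_real, rowvar=False)
        cov2 = np.cov(feats_fake, rowvar=False)
        covmean = linalg.sqrtm(cov1 @ cov2)   # matrix square root
        covmean = covmean.real                # discard tiny numerical imaginary parts
        return float(np.sum((mu1 - mu2) ** 2) + np.trace(cov1 + cov2 - 2 * covmean))

    rng = np.random.default_rng(0)
    print(frechet_distance(rng.normal(0, 1, (500, 16)), rng.normal(0.5, 1, (500, 16))))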


Variants

There is a veritable zoo of GAN variants. Some of the most prominent are as follows:


Conditional GAN

Conditional GANs are similar to standard GANs except that they allow the model to conditionally generate samples based on additional information. For example, to generate a cat face given a dog picture, one could use a conditional GAN.

The generator in a GAN game generates \mu_G, a probability distribution on the probability space \Omega. This leads to the idea of a conditional GAN, where instead of generating one probability distribution on \Omega, the generator generates a different probability distribution \mu_G(c) on \Omega for each given class label c. For example, for generating images that look like ImageNet, the generator should be able to generate a picture of a cat when given the class label "cat". In the original paper, the authors noted that GANs can be trivially extended to conditional GANs by providing the labels to both the generator and the discriminator. Concretely, the conditional GAN game is just the GAN game with class labels provided:

L(\mu_G, D) := \mathbb E_{c\sim \mu_C, x\sim \mu_{\text{ref}}(c)}[\ln D(x, c)] + \mathbb E_{c\sim \mu_C, x\sim \mu_G(c)}[\ln (1-D(x, c))]

where \mu_C is a probability distribution over classes, \mu_{\text{ref}}(c) is the probability distribution of real images of class c, and \mu_G(c) is the probability distribution of images generated by the generator when given class label c. In 2017, a conditional GAN learned to generate 1000 image classes of ImageNet.
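A common way to condition the networks is to embed the class label and concatenate it with the other inputs. A minimal sketch of a conditional generator; the layer sizes and embedding dimension are illustrative assumptions:

    import torch
    import torch.nn as nn

    class ConditionalGenerator(nn.Module):
        def __init__(self, z_dim=64, n_classes=10, out_dim=784):
            super().__init__()
            self.embed = nn.Embedding(n_classes, 16)   # label -> dense vector
            self.net = nn.Sequential(
                nn.Linear(z_dim + 16, 256), nn.ReLU(),
                nn.Linear(256, out_dim), nn.Tanh(),
            )

        def forward(self, z, c):
            # Condition by concatenating the noise with the label embedding.
            return self.net(torch.cat([z, self.embed(c)], dim=1))

    G = ConditionalGenerator()
    x = G(torch.randn(8, 64), torch.randint(0, 10, (8,)))   # 8 samples of given classes
    print(x.shape)   # torch.Size([8, 784])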


GANs with alternative architectures

The GAN game is a general framework and can be run with any reasonable parametrization of the generator G and discriminator D. In the original paper, the authors demonstrated it using multilayer perceptron networks and convolutional neural networks. Many alternative architectures have been tried:

* Deep convolutional GAN (DCGAN): for both generator and discriminator, uses only deep networks consisting entirely of convolution-deconvolution layers, that is, fully convolutional networks.
* Self-attention GAN (SAGAN): starts with the DCGAN, then adds residually connected standard self-attention modules to the generator and discriminator.
* Variational autoencoder GAN (VAEGAN): uses a variational autoencoder (VAE) for the generator.
* Transformer GAN (TransGAN): uses a pure transformer architecture for both the generator and discriminator, entirely devoid of convolution-deconvolution layers.
* Flow-GAN: uses a flow-based generative model for the generator, allowing efficient computation of the likelihood function.


GANs with alternative objectives

Many GAN variants are obtained merely by changing the loss functions for the generator and discriminator.

Original GAN: We recast the original GAN objective into a form more convenient for comparison:

\begin{cases} \min_D L_D(D, \mu_G) = -\mathbb E_{x\sim \mu_{\text{ref}}}[\ln D(x)] - \mathbb E_{x\sim \mu_G}[\ln (1-D(x))] \\ \min_G L_G(D, \mu_G) = \mathbb E_{x\sim \mu_G}[\ln (1-D(x))] \end{cases}

Original GAN, non-saturating loss: This objective for the generator was recommended in the original paper for faster convergence:

L_G = -\mathbb E_{x\sim \mu_G}[\ln D(x)]

The effect of using this objective is analyzed in Section 2.2.2 of the cited analysis.

Original GAN, maximum likelihood:

L_G = -\mathbb E_{x\sim \mu_G}[(\exp \circ \sigma^{-1} \circ D)(x)]

where \sigma is the logistic function. When the discriminator is optimal, the generator gradient is the same as in maximum likelihood estimation, even though GAN cannot perform maximum likelihood estimation ''itself''.

Hinge loss GAN:

L_D = -\mathbb E_{x\sim \mu_{\text{ref}}}[\min(0, -1 + D(x))] - \mathbb E_{x\sim \mu_G}[\min(0, -1 - D(x))]

L_G = -\mathbb E_{x\sim \mu_G}[D(x)]

Least squares GAN:

L_D = \mathbb E_{x\sim \mu_{\text{ref}}}[(D(x)-b)^2] + \mathbb E_{x\sim \mu_G}[(D(x)-a)^2]

L_G = \mathbb E_{x\sim \mu_G}[(D(x)-c)^2]

where a, b, c are parameters to be chosen. The authors recommended a = -1, b = 1, c = 0.
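These objectives translate directly into loss functions over discriminator outputs. A sketch of the hinge and least-squares losses, where `d_real` and `d_fake` are assumed to be the discriminator's (unsquashed) outputs on real and generated batches; note that -E[min(0, -1 + D(x))] equals E[relu(1 - D(x))]:

    import torch
    import torch.nn.functional as F

    def hinge_d_loss(d_real, d_fake):
        # L_D = E[relu(1 - D(real))] + E[relu(1 + D(fake))]
        return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

    def hinge_g_loss(d_fake):
        # L_G = -E[D(fake)]
        return -d_fake.mean()

    def lsgan_d_loss(d_real, d_fake, a=-1.0, b=1.0):
        # L_D = E[(D(real) - b)^2] + E[(D(fake) - a)^2]
        return ((d_real - b) ** 2).mean() + ((d_fake - a) ** 2).mean()

    def lsgan_g_loss(d_fake, c=0.0):
        # L_G = E[(D(fake) - c)^2]
        return ((d_fake - c) ** 2).mean()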


Wasserstein GAN (WGAN)

The Wasserstein GAN modifies the GAN game at two points:

* The discriminator's strategy set is the set of measurable functions of type D: \Omega \to \R with bounded Lipschitz norm \| D\|_L \leq K, where K is a fixed positive constant.
* The objective is

L_{\text{WGAN}}(\mu_G, D) := \mathbb E_{x\sim \mu_{\text{ref}}}[D(x)] - \mathbb E_{x\sim \mu_G}[D(x)]

One of its purposes is to solve the problem of mode collapse (see above). The authors claim: "In no experiment did we see evidence of mode collapse for the WGAN algorithm".
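A sketch of the WGAN critic update with the weight clipping that the original WGAN paper used to enforce the Lipschitz bound (the clip value 0.01 and RMSprop learning rate follow that paper's defaults; the toy network is an assumption):

    import torch
    import torch.nn as nn

    critic = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 1))
    opt_C = torch.optim.RMSprop(critic.parameters(), lr=5e-5)

    def critic_step(real, fake, clip=0.01):
        opt_C.zero_grad()
        # Maximize E[D(real)] - E[D(fake)], i.e. minimize the negation.
        loss = critic(fake.detach()).mean() - critic(real).mean()
        loss.backward()
        opt_C.step()
        # Weight clipping keeps the critic (roughly) K-Lipschitz.
        with torch.no_grad():
            for p in critic.parameters():
                p.clamp_(-clip, clip)
        return loss.item()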


GANs with more than 2 players


Adversarial autoencoder

An adversarial autoencoder (AAE) is more autoencoder than GAN. The idea is to start with a plain autoencoder, but train a discriminator to distinguish the latent vectors from a reference distribution (often the normal distribution).


InfoGAN

In conditional GAN, the generator receives both a noise vector z and a label c, and produces an image G(z, c). The discriminator receives image-label pairs (x, c), and computes D(x, c). When the training dataset is unlabeled, conditional GAN does not work directly. The idea of InfoGAN is to decree that every latent vector in the latent space can be decomposed as (z, c): an incompressible noise part z and an informative label part c, and to encourage the generator to comply with the decree by encouraging it to maximize I(c, G(z, c)), the mutual information between c and G(z, c), while making no demands on the mutual information between z and G(z, c). Unfortunately, I(c, G(z, c)) is intractable in general. The key idea of InfoGAN is Variational Mutual Information Maximization: maximize it indirectly by maximizing a lower bound

\hat I(G, Q) = \mathbb E_{z\sim \mu_Z, c\sim \mu_C}[\ln Q(c \mid G(z, c))], \quad I(c, G(z, c)) \geq \sup_Q \hat I(G, Q)

where Q ranges over all Markov kernels of type Q: \Omega_Y \to \mathcal P(\Omega_C). The InfoGAN game is defined as follows:

Three probability spaces define an InfoGAN game:
* (\Omega_X, \mu_{\text{ref}}), the space of reference images.
* (\Omega_Z, \mu_Z), the fixed random noise generator.
* (\Omega_C, \mu_C), the fixed random information generator.

There are 3 players in 2 teams: generator, Q, and discriminator. The generator and Q are on one team, and the discriminator on the other team. The objective function is

L(G, Q, D) = L_{\text{GAN}}(G, D) - \lambda \hat I(G, Q)

where L_{\text{GAN}}(G, D) = \mathbb E_{x\sim \mu_{\text{ref}}}[\ln D(x)] + \mathbb E_{z\sim \mu_Z, c\sim \mu_C}[\ln (1-D(G(z, c)))] is the original GAN game objective, and \hat I(G, Q) = \mathbb E_{z\sim \mu_Z, c\sim \mu_C}[\ln Q(c \mid G(z, c))].

The generator-Q team aims to minimize the objective, and the discriminator aims to maximize it:

\min_{G, Q} \max_D L(G, Q, D)
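In practice the variational bound is implemented by giving the network an auxiliary head Q that predicts c from the generated image; for a categorical code the bound becomes a cross-entropy term. A sketch under those assumptions (the linear Q-head and shapes are placeholders):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Q-head: predicts the categorical code c from a (flattened) generated image.
    q_head = nn.Linear(784, 10)

    def info_loss(fake_images, codes, lam=1.0):
        # -E[ln Q(c | G(z, c))]: minimizing this maximizes the variational bound.
        logits = q_head(fake_images)
        return lam * F.cross_entropy(logits, codes)

    fake = torch.randn(8, 784)        # stand-in for G(z, c) outputs
    c = torch.randint(0, 10, (8,))    # the informative codes fed to G
    print(info_loss(fake, c))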


Bidirectional GAN (BiGAN)

The standard GAN generator is a function of type G: \Omega_Z \to \Omega_X, that is, a mapping from a latent space \Omega_Z to the image space \Omega_X. This can be understood as a "decoding" process, whereby every latent vector z \in \Omega_Z is a code for an image x \in \Omega_X, and the generator performs the decoding. This naturally leads to the idea of training another network that performs "encoding", creating an autoencoder out of the encoder-generator pair. Already in the original paper, the authors noted that "Learned approximate inference can be performed by training an auxiliary network to predict z given x". The bidirectional GAN architecture performs exactly this. The BiGAN is defined as follows:

Two probability spaces define a BiGAN game:
* (\Omega_X, \mu_X), the space of reference images.
* (\Omega_Z, \mu_Z), the latent space.

There are 3 players in 2 teams: generator, encoder, and discriminator. The generator and encoder are on one team, and the discriminator on the other team. The generator's strategies are functions G: \Omega_Z \to \Omega_X, and the encoder's strategies are functions E: \Omega_X \to \Omega_Z. The discriminator's strategies are functions D: \Omega_X \times \Omega_Z \to [0, 1]. The objective function is

L(G, E, D) = \mathbb E_{x\sim \mu_X}[\ln D(x, E(x))] + \mathbb E_{z\sim \mu_Z}[\ln (1-D(G(z), z))]

The generator-encoder team aims to minimize the objective, and the discriminator aims to maximize it:

\min_{G, E} \max_D L(G, E, D)

In the paper, they gave a more abstract definition of the objective as

L(G, E, D) = \mathbb E_{(x, z)\sim \mu_{E}}[\ln D(x, z)] + \mathbb E_{(x, z)\sim \mu_{G}}[\ln (1-D(x, z))]

where \mu_{E}(dx, dz) = \mu_X(dx) \cdot \delta_{E(x)}(dz) is the probability distribution on \Omega_X \times \Omega_Z obtained by pushing \mu_X forward via x \mapsto (x, E(x)), and \mu_{G}(dx, dz) = \delta_{G(z)}(dx) \cdot \mu_Z(dz) is the probability distribution on \Omega_X \times \Omega_Z obtained by pushing \mu_Z forward via z \mapsto (G(z), z). Applications of bidirectional models include semi-supervised learning, interpretable machine learning, and neural machine translation.


CycleGAN

CycleGAN is an architecture for performing translations between two domains, such as between photos of horses and photos of zebras, or between photos of night cities and photos of day cities. The CycleGAN game is defined as follows:

There are two probability spaces (\Omega_X, \mu_X) and (\Omega_Y, \mu_Y), corresponding to the two domains needed for translations back and forth. There are 4 players in 2 teams: generators G_X: \Omega_X \to \Omega_Y and G_Y: \Omega_Y \to \Omega_X, and discriminators D_X: \Omega_X \to [0, 1] and D_Y: \Omega_Y \to [0, 1]. The objective function is

L(G_X, G_Y, D_X, D_Y) = L_{\text{GAN}}(G_X, D_X) + L_{\text{GAN}}(G_Y, D_Y) + \lambda L_{\text{cycle}}(G_X, G_Y)

where \lambda is a positive adjustable parameter, L_{\text{GAN}} is the GAN game objective, and L_{\text{cycle}} is the ''cycle consistency loss'':

L_{\text{cycle}}(G_X, G_Y) = \mathbb E_{x\sim \mu_X} \| G_Y(G_X(x)) - x\| + \mathbb E_{y\sim \mu_Y} \| G_X(G_Y(y)) - y\|

The generators aim to minimize the objective, and the discriminators aim to maximize it:

\min_{G_X, G_Y} \max_{D_X, D_Y} L(G_X, G_Y, D_X, D_Y)

Unlike previous work like pix2pix, which requires paired training data, CycleGAN requires no paired data. For example, to train a pix2pix model to turn a summer scenery photo into a winter scenery photo and back, the dataset must contain pairs of the same place in summer and winter, shot at the same angle; CycleGAN needs only a set of summer scenery photos and an unrelated set of winter scenery photos.
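The cycle consistency term is a simple reconstruction penalty. A sketch using the L1 norm, as in the CycleGAN paper; the stand-in linear generators and the weight lam=10.0 are illustrative assumptions:

    import torch
    import torch.nn as nn

    G_X = nn.Linear(32, 32)   # stand-in for the X -> Y generator
    G_Y = nn.Linear(32, 32)   # stand-in for the Y -> X generator

    def cycle_loss(x, y, lam=10.0):
        # ||G_Y(G_X(x)) - x||_1 + ||G_X(G_Y(y)) - y||_1
        forward = (G_Y(G_X(x)) - x).abs().mean()
        backward = (G_X(G_Y(y)) - y).abs().mean()
        return lam * (forward + backward)

    print(cycle_loss(torch.randn(8, 32), torch.randn(8, 32)))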


GANs with particularly large or small scales


BigGAN

The BigGAN is essentially a self-attention GAN trained on a large scale (up to 80 million parameters) to generate large images of ImageNet (up to 512 x 512 resolution), with numerous engineering tricks to make it converge.


Invertible data augmentation

When there is insufficient training data, the reference distribution \mu_{\text{ref}} cannot be well-approximated by the empirical distribution given by the training dataset. In such cases, data augmentation can be applied, to allow training GANs on smaller datasets. Naïve data augmentation, however, brings its own problems.

Consider the original GAN game, slightly reformulated as follows:

\begin{cases} \min_D L_D(D, \mu_G) = -\mathbb E_{x\sim \mu_{\text{ref}}}[\ln D(x)] - \mathbb E_{x\sim \mu_G}[\ln (1-D(x))] \\ \min_G L_G(D, \mu_G) = \mathbb E_{x\sim \mu_G}[\ln (1-D(x))] \end{cases}

Now we use data augmentation by randomly sampling semantic-preserving transforms T: \Omega \to \Omega and applying them to the dataset, to obtain the reformulated GAN game:

\begin{cases} \min_D L_D(D, \mu_G) = -\mathbb E_{x\sim \mu_{\text{ref}}, T\sim \mu_{\text{trans}}}[\ln D(T(x))] - \mathbb E_{x\sim \mu_G}[\ln (1-D(x))] \\ \min_G L_G(D, \mu_G) = \mathbb E_{x\sim \mu_G}[\ln (1-D(x))] \end{cases}

This is equivalent to a GAN game with a different distribution \mu_{\text{ref}}', sampled by T(x) with x\sim \mu_{\text{ref}} and T\sim \mu_{\text{trans}}. For example, if \mu_{\text{ref}} is the distribution of images in ImageNet, and \mu_{\text{trans}} samples the identity transform with probability 0.5 and horizontal reflection with probability 0.5, then \mu_{\text{ref}}' is the distribution of images in ImageNet and horizontally reflected ImageNet, combined. The result of such training would be a generator that mimics \mu_{\text{ref}}'. For example, it would generate images that look like they are randomly cropped, if the data augmentation uses random cropping.

The solution is to apply data augmentation to both generated and real images:

\begin{cases} \min_D L_D(D, \mu_G) = -\mathbb E_{x\sim \mu_{\text{ref}}, T\sim \mu_{\text{trans}}}[\ln D(T(x))] - \mathbb E_{x\sim \mu_G, T\sim \mu_{\text{trans}}}[\ln (1-D(T(x)))] \\ \min_G L_G(D, \mu_G) = \mathbb E_{x\sim \mu_G, T\sim \mu_{\text{trans}}}[\ln (1-D(T(x)))] \end{cases}

The authors demonstrated high-quality generation using datasets of just 100 pictures.

The StyleGAN-2-ADA paper points out a further requirement on data augmentation: it must be ''invertible''. Continuing with the example of generating ImageNet pictures: if the data augmentation is "randomly rotate the picture by 0, 90, 180, 270 degrees with ''equal'' probability", then there is no way for the generator to know which is the true orientation. Consider two generators G, G' such that for any latent z, the generated image G(z) is a 90-degree rotation of G'(z). They would have exactly the same expected loss, and so neither is preferred over the other. The solution is to use only invertible data augmentation: instead of "randomly rotate the picture by 0, 90, 180, 270 degrees with ''equal'' probability", use "randomly rotate the picture by 90, 180, 270 degrees with 0.1 probability each, and keep the picture as it is with 0.7 probability". This way, the generator is still rewarded for keeping images oriented the same way as un-augmented ImageNet pictures.

Abstractly, the effect of randomly sampling transformations T: \Omega \to \Omega from the distribution \mu_{\text{trans}} is to define a Markov kernel K_{\text{trans}}: \Omega \to \mathcal P(\Omega). Then the data-augmented GAN game pushes the generator to find some \hat \mu_G \in \mathcal P(\Omega) such that

K_{\text{trans}} * \mu_{\text{ref}} = K_{\text{trans}} * \hat\mu_G

where * is the Markov kernel convolution. A data-augmentation method is defined to be ''invertible'' if its Markov kernel K_{\text{trans}} satisfies

K_{\text{trans}} * \mu = K_{\text{trans}} * \mu' \implies \mu = \mu' \quad \forall \mu, \mu' \in \mathcal P(\Omega)

Immediately by definition, we see that composing multiple invertible data-augmentation methods results in yet another invertible method. Also by definition, if the data-augmentation method is invertible, then using it in a GAN game does not change the optimal strategy \hat \mu_G for the generator, which is still \mu_{\text{ref}}.
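In code, the fix of augmenting both real and generated batches amounts to routing them through the same family of random transforms before they reach the discriminator. A minimal sketch; the particular transform here, a random horizontal flip, is an illustrative choice:

    import torch

    def augment(batch):
        # Apply the same family of random transforms to every batch the
        # discriminator sees, whether real or generated (here: random h-flip).
        flip = torch.rand(batch.size(0), 1, 1, 1) < 0.5
        return torch.where(flip, batch.flip(dims=[3]), batch)

    real = torch.randn(8, 3, 32, 32)   # real images
    fake = torch.randn(8, 3, 32, 32)   # stand-in for G(z)
    d_in_real, d_in_fake = augment(real), augment(fake)   # both augmented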
There are two prototypical examples of invertible Markov kernels.

Discrete case: invertible stochastic matrices, when \Omega is finite. For example, if \Omega is the set of four images of an arrow pointing in 4 directions, and the data augmentation is "randomly rotate the picture by 90, 180, 270 degrees with probability p, and keep the picture as it is with probability (1-3p)", then the Markov kernel K_{\text{trans}} can be represented as a stochastic matrix:

[K_{\text{trans}}] = \begin{pmatrix} (1-3p) & p & p & p \\ p & (1-3p) & p & p \\ p & p & (1-3p) & p \\ p & p & p & (1-3p) \end{pmatrix}

and K_{\text{trans}} is an invertible kernel iff [K_{\text{trans}}] is an invertible matrix, that is, p \neq 1/4.

Continuous case: the Gaussian kernel, when \Omega = \R^n for some n \geq 1. For example, if \Omega = \R^{256^2} is the space of 256x256 images, and the data-augmentation method is "generate a Gaussian noise z \sim \mathcal N(0, I_{256^2}), then add \epsilon z to the image", then K_{\text{trans}} is just convolution by the density function of \mathcal N(0, \epsilon^2 I_{256^2}). This is invertible, because convolution by a Gaussian is just convolution by the heat kernel, so given any \mu \in \mathcal P(\R^n), the convolved distribution K_{\text{trans}} * \mu can be obtained by heating up \R^n precisely according to \mu, then waiting for time \epsilon^2/4. With that, we can recover \mu by running the heat equation ''backwards in time'' for \epsilon^2/4. More examples of invertible data augmentations are found in the paper.
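The discrete case can be checked numerically: the 4x4 rotation-augmentation matrix above is singular exactly at p = 1/4 (its eigenvalues are 1 and 1-4p with multiplicity 3). A quick sketch:

    import numpy as np

    def rotation_kernel(p):
        # Stochastic matrix: keep with prob 1-3p, rotate 90/180/270 with prob p each.
        K = np.full((4, 4), p)
        np.fill_diagonal(K, 1 - 3 * p)
        return K

    for p in (0.1, 0.25):
        det = np.linalg.det(rotation_kernel(p))
        print(f"p={p}: det={det:.4f} -> {'invertible' if abs(det) > 1e-12 else 'singular'}")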


SinGAN

SinGAN pushes data augmentation to the limit, by using only a single image as training data and performing data augmentation on it. The GAN architecture is adapted to this training method by using a multi-scale pipeline. The generator G is decomposed into a pyramid of generators G = G_1 \circ G_2 \circ \cdots \circ G_N, with the lowest one generating the image G_N(z_N) at the lowest resolution; the generated image is then scaled up to r(G_N(z_N)) and fed to the next level to generate an image G_{N-1}(z_{N-1} + r(G_N(z_N))) at a higher resolution, and so on. The discriminator is decomposed into a pyramid as well.


StyleGAN series

The StyleGAN family is a series of architectures published by Nvidia's research division.


Progressive GAN

Progressive GAN is a method for training GANs for large-scale image generation stably, by growing a GAN generator from small to large scale in a pyramidal fashion. Like SinGAN, it decomposes the generator as G = G_1 \circ G_2 \circ \cdots \circ G_N, and the discriminator as D = D_1 \circ D_2 \circ \cdots \circ D_N.

During training, at first only G_N, D_N are used in a GAN game to generate 4x4 images. Then G_{N-1}, D_{N-1} are added to reach the second stage of the GAN game, to generate 8x8 images, and so on, until a GAN game generating 1024x1024 images is reached. To avoid shock between stages of the GAN game, each new layer is "blended in" (Figure 2 of the paper). For example, this is how the second-stage GAN game starts:

* Just before, the GAN game consists of the pair G_N, D_N, generating and discriminating 4x4 images.
* Just after, the GAN game consists of the pair ((1-\alpha)\cdot \mathrm{id} + \alpha\cdot G_{N-1})\circ u \circ G_N, D_N \circ d \circ ((1-\alpha)\cdot \mathrm{id} + \alpha\cdot D_{N-1}), generating and discriminating 8x8 images. Here, the functions u, d are image up- and down-sampling functions, and \alpha is a blend-in factor (much like an alpha in image compositing) that glides smoothly from 0 to 1.
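The blend-in on the generator side can be written directly as a convex combination of the old (upsampled) path and the new layer's path. A sketch; the toy convolutional stage and nearest-neighbor upsampling are illustrative assumptions:

    import torch
    import torch.nn.functional as F

    def blended_upscale(x_lowres, new_layer, alpha):
        """Fade in a new generator stage: (1-alpha)*up(x) + alpha*new_layer(up(x))."""
        up = F.interpolate(x_lowres, scale_factor=2, mode="nearest")
        return (1 - alpha) * up + alpha * new_layer(up)

    new_layer = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)   # stand-in stage
    x = torch.randn(1, 3, 4, 4)
    print(blended_upscale(x, new_layer, alpha=0.3).shape)   # (1, 3, 8, 8)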


StyleGAN-1

StyleGAN-1 is designed as a combination of Progressive GAN and neural style transfer. The key architectural choice of StyleGAN-1 is a progressive growth mechanism, similar to Progressive GAN. Each generated image starts as a constant 4\times 4 \times 512 array and is repeatedly passed through style blocks. Each style block applies a "style latent vector" via an affine transform ("adaptive instance normalization"), similar to how neural style transfer uses the Gramian matrix. It then adds noise and normalizes (subtracts the mean, then divides by the variance).

At training time, usually only one style latent vector is used per generated image, but sometimes two ("mixing regularization") in order to encourage each style block to perform its stylization independently, without expecting help from other style blocks (since they might receive an entirely different style latent vector). After training, multiple style latent vectors can be fed into each style block. Those fed to the lower layers control the large-scale styles, and those fed to the higher layers control the fine-detail styles.

Style mixing between two images x, x' can be performed as well. First, run a gradient descent to find z, z' such that G(z) \approx x, G(z') \approx x'. This is called "projecting an image back to style latent space". Then, z can be fed to the lower style blocks and z' to the higher style blocks, to generate a composite image that has the large-scale style of x and the fine-detail style of x'. Multiple images can also be composed this way.
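The "adaptive instance normalization" step can be sketched as: normalize each channel of the feature map, then scale and shift it by an affine function of the style vector. The shapes and the linear affine map below are illustrative assumptions, not the exact StyleGAN implementation:

    import torch
    import torch.nn as nn

    class AdaIN(nn.Module):
        def __init__(self, style_dim, channels):
            super().__init__()
            self.affine = nn.Linear(style_dim, 2 * channels)   # style -> (scale, shift)

        def forward(self, x, w):
            # Instance-normalize each channel, then modulate by the style w.
            mu = x.mean(dim=(2, 3), keepdim=True)
            sigma = x.std(dim=(2, 3), keepdim=True) + 1e-8
            scale, shift = self.affine(w).chunk(2, dim=1)
            scale = scale[:, :, None, None]
            shift = shift[:, :, None, None]
            return scale * (x - mu) / sigma + shift

    ada = AdaIN(style_dim=512, channels=64)
    print(ada(torch.randn(2, 64, 8, 8), torch.randn(2, 512)).shape)   # (2, 64, 8, 8)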


StyleGAN-2

StyleGAN-2 improves upon StyleGAN-1, by using the style latent vector to transform the convolution layer's weights instead, thus solving the "blob" problem. This was updated by the StyleGAN-2-ADA ("ADA" stands for "adaptive"), which uses invertible data augmentation as described above. It also tunes the amount of data augmentation applied by starting at zero, and gradually increasing it until an "overfitting heuristic" reaches a target level, thus the name "adaptive".


StyleGAN-3

StyleGAN-3 improves upon StyleGAN-2 by solving the "texture sticking" problem, which can be seen in the official videos. The authors analyzed the problem through the lens of the Nyquist–Shannon sampling theorem, and argued that the layers in the generator had learned to exploit the high-frequency signal in the pixels they operate upon. To solve this, they proposed imposing strict lowpass filters between each of the generator's layers, so that the generator is forced to operate on the pixels in a way faithful to the continuous signals they represent, rather than operating on them as merely discrete signals. They further imposed rotational and translational invariance by using more signal filters. The resulting StyleGAN-3 is able to solve the texture sticking problem, as well as generate images that rotate and translate smoothly.


Applications

GAN applications have increased rapidly.


Fashion, art and advertising

GANs can be used to generate art; ''The Verge'' wrote in March 2019 that "The images created by GANs have become the defining look of contemporary AI art." GANs can also be used to inpaint photographs or create photos of imaginary fashion models, with no need to hire a model, photographer or makeup artist, or pay for a studio and transportation. GANs have also been used for virtual shadow generation.


Interactive media

In 2020, Artbreeder was used to create the main antagonist in the sequel to the psychological web horror series ''Ben Drowned''. The author would later go on to praise GAN applications for their ability to help generate assets for independent artists who are short on budget and manpower.


Science

GANs can improve astronomical images and simulate gravitational lensing for dark matter research. They were used in 2019 to successfully model the distribution of dark matter in a particular direction in space and to predict the gravitational lensing that will occur. GANs have been proposed as a fast and accurate way of modeling high-energy jet formation and modeling showers through calorimeters of high-energy physics experiments. GANs have also been trained to accurately approximate bottlenecks in computationally expensive simulations of particle physics experiments. Applications in the context of present and proposed CERN experiments have demonstrated the potential of these methods for accelerating simulation and/or improving simulation fidelity.


Video games

In 2018, GANs reached the
video game modding Video game modding (short for "modification") is the process of alteration by players or fans of one or more aspects of a video game, such as how it looks or behaves, and is a sub-discipline of general modding. Mods may range from small changes an ...
community, as a method of up-scaling low-resolution 2D textures in old video games by recreating them in 4k or higher resolutions via image training, and then down-sampling them to fit the game's native resolution (with results resembling the
supersampling Supersampling or supersampling anti-aliasing (SSAA) is a spatial anti-aliasing method, i.e. a method used to remove aliasing (jagged and pixelated edges, colloquially known as "jaggies") from images rendered in computer games or other computer p ...
method of
anti-aliasing Anti-aliasing may refer to any of a number of techniques to combat the problems of aliasing in a sampled signal such as a digital image or digital audio recording. Specific topics in anti-aliasing include: * Anti-aliasing filter, a filter used be ...
). With proper training, GANs provide a clearer and sharper 2D texture image, orders of magnitude higher in quality than the original, while fully retaining the original's level of detail, colors, etc. Known examples of extensive GAN usage include ''
Final Fantasy VIII is a role-playing video game developed and published by Square for the PlayStation console. Released in 1999, it is the eighth main installment in the ''Final Fantasy'' series. Set on an unnamed fantasy world with science fiction elements, t ...
'', ''
Final Fantasy IX is a 2000 role-playing video game developed and published by Square (video game company), Square for the PlayStation (console), PlayStation video game console. It is the ninth game in the main ''Final Fantasy'' series. The plot focuses on a wa ...
'', ''Resident Evil REmake'' HD Remaster, and ''Max Payne''.
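The workflow just described can be sketched in a few lines, assuming a pretrained GAN super-resolution generator is available; the load_generator helper, the model file, and the file names are hypothetical placeholders rather than any specific mod project's code.

import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor, to_pil_image

# Hypothetical loader for any pretrained GAN upscaler (e.g., an ESRGAN-style generator).
generator = load_generator("esrgan_4x.pth")
generator.eval()

texture = Image.open("old_game_texture_256.png").convert("RGB")

# Recreate the texture at a higher resolution via the GAN.
with torch.no_grad():
    upscaled = to_pil_image(
        generator(to_tensor(texture).unsqueeze(0)).squeeze(0).clamp(0, 1))

# Downsample back to the game's native resolution; as with supersampling,
# the high-resolution intermediate yields a cleaner, sharper final texture.
final = upscaled.resize(texture.size, Image.LANCZOS)
final.save("remastered_texture_256.png")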


AI-generated video

Artificial intelligence art Artificial intelligence art is any artwork created through the use of artificial intelligence. Tools and processes Imagery There are many mechanisms for creating AI art, including procedural 'rule-based' generation of images using mathemati ...
for video uses AI to generate video from text, as with Text-to-Video models.


Audio synthesis


Concerns about malicious applications

Concerns have been raised about the potential use of GAN-based
human image synthesis Human image synthesis is technology that can be applied to make believable and even photorealistic renditions of human-likenesses, moving or still. It has effectively existed since the early 2000s. Many films using computer generated imagery ha ...
for sinister purposes, e.g., to produce fake, possibly incriminating, photographs and videos. GANs can be used to generate unique, realistic profile photos of people who do not exist, in order to automate the creation of fake social media profiles. On October 3, 2019, the state of California passed
bill AB-602
, which bans the use of human image synthesis technologies to make fake pornography without the consent of the people depicted, and
bill AB-730
, which prohibits distribution of manipulated videos of a political candidate within 60 days of an election. Both bills were authored by Assembly member
Marc Berman Marc Berman (born October 31, 1980) is a politician and attorney, currently serving as a member of the California State Assembly. He is a Democrat representing the 24th Assembly District, encompassing parts of the San Francisco Peninsula and S ...
and signed by Governor
Gavin Newsom Gavin Christopher Newsom (born October 10, 1967) is an American politician and businessman who has been the 40th governor of California since 2019. A member of the Democratic Party, he served as the 49th lieutenant governor of California fr ...
. The laws went into effect in 2020. DARPA's Media Forensics program studies ways to counteract fake media, including fake media produced using GANs.


Transfer learning

State-of-the-art
transfer learning Transfer learning (TL) is a research problem in machine learning (ML) that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem. For example, knowledge gained while learning to recognize ...
research uses GANs to enforce the alignment of the latent feature space, as in deep reinforcement learning. This works by feeding the embeddings of the source and target tasks to the discriminator, which tries to guess the context. The resulting loss is then (inversely) backpropagated through the encoder.
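The following is a minimal sketch of this alignment scheme, assuming PyTorch; the gradient-reversal trick stands in for the "inversely backpropagated" loss described above, and all module names, layer sizes, and the toy data are illustrative assumptions rather than any published implementation.

import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    # Identity on the forward pass; negated gradient on the backward pass,
    # so the encoder is trained to fool the domain discriminator.
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

# Shared encoder producing latent features, and a discriminator that
# tries to guess which task ("context") an embedding came from.
encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 16))
discriminator = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(discriminator.parameters()), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def alignment_step(source_batch, target_batch):
    # Embed both tasks with the shared encoder.
    z = torch.cat([encoder(source_batch), encoder(target_batch)])
    # Context labels: 0 = source task, 1 = target task.
    labels = torch.cat([torch.zeros(len(source_batch), 1),
                        torch.ones(len(target_batch), 1)])
    logits = discriminator(GradReverse.apply(z))
    loss = bce(logits, labels)
    optimizer.zero_grad()
    loss.backward()  # reversed gradient pushes the encoder toward aligned features
    optimizer.step()
    return loss.item()

# Toy usage: two batches of 64-dimensional inputs from different tasks.
alignment_step(torch.randn(32, 64), torch.randn(32, 64))

A single optimizer suffices here because the gradient reversal hands the discriminator its normal gradient while giving the encoder the negated one, so one backward pass pushes the two networks in opposing directions.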


Miscellaneous applications

GANs can be used to detect glaucomatous images, helping early diagnosis, which is essential to avoid partial or total loss of vision. GANs that produce
photorealistic Photorealism is a genre of art that encompasses painting, drawing and other graphic media, in which an artist studies a photograph and then attempts to reproduce the image as realistically as possible in another medium. Although the term can be ...
images can be used to visualize
interior design Interior design is the art and science of enhancing the interior of a building to achieve a healthier and more aesthetically pleasing environment for the people using the space. An interior designer is someone who plans, researches, coordina ...
,
industrial design Industrial design is a process of design applied to physical Product (business), products that are to be manufactured by mass production. It is the creative act of determining and defining a product's form and features, which takes place in advan ...
, shoes, bags, and
clothing Clothing (also known as clothes, apparel, and attire) are items worn on the body. Typically, clothing is made of fabrics or textiles, but over time it has included garments made from animal skin and other thin sheets of materials and natural ...
items or items for computer games' scenes. Such networks were reported to be used by
Facebook Facebook is an online social media and social networking service owned by American company Meta Platforms. Founded in 2004 by Mark Zuckerberg with fellow Harvard College students and roommates Eduardo Saverin, Andrew McCollum, Dustin M ...
. GANs have been used to create
forensic facial reconstruction Forensic facial reconstruction (or forensic facial approximation) is the process of recreating the face of an individual (whose identity is often not known) from their skeletal remains through an amalgamation of artistry, anthropology, osteol ...
s of deceased historical figures. GANs can reconstruct 3D models of objects from images, generate novel objects as 3D point clouds, and model patterns of motion in video. GANs can be used to age face photographs to show how an individual's appearance might change with age. GANs can be used for data augmentation, e.g., to improve the performance of a DNN classifier (a code sketch of this use appears at the end of this section). GANs can also be used to inpaint missing features in maps, transfer map styles in cartography, or augment street-view imagery. Relevance feedback on GANs can be used to generate images and replace image search systems. A variation of GANs is used in training a network to generate optimal control inputs to nonlinear
dynamical system In mathematics, a dynamical system is a system in which a Function (mathematics), function describes the time dependence of a Point (geometry), point in an ambient space. Examples include the mathematical models that describe the swinging of a ...
s. Here the discriminative network is known as the critic, which checks the optimality of the solution, and the generative network is known as the adaptive network, which generates the optimal control. The critic and adaptive network train each other to approximate a nonlinear optimal control. GANs have been used to visualize the effect that climate change will have on specific houses. A GAN model called Speech2Face can reconstruct an image of a person's face after listening to their voice. In 2016, GANs were used to generate new molecules for a variety of protein targets implicated in cancer, inflammation, and fibrosis. In 2019, GAN-generated molecules were validated experimentally in mice. While the majority of GAN applications are in image processing, work has also been done with time-series data. For example, recurrent GANs (R-GANs) have been used to generate energy data for machine learning.
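As a concrete illustration of the data-augmentation use flagged above, here is a hedged sketch assuming PyTorch and a pretrained conditional generator G(z, y); G's interface, the latent dimension, and the class count are illustrative assumptions, not a specific library API.

import torch

def augment_batch(real_x, real_y, G, n_synthetic, latent_dim=128, n_classes=10):
    # Draw latent noise and random class labels, then sample synthetic
    # examples from the (assumed pretrained) conditional generator.
    z = torch.randn(n_synthetic, latent_dim)
    fake_y = torch.randint(0, n_classes, (n_synthetic,))
    with torch.no_grad():
        fake_x = G(z, fake_y)  # hypothetical conditional-generator call
    # Mix real and synthetic samples and shuffle, enlarging the training
    # set seen by the downstream DNN classifier.
    x = torch.cat([real_x, fake_x])
    y = torch.cat([real_y, fake_y])
    perm = torch.randperm(len(x))
    return x[perm], y[perm]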


History

The most direct inspiration for GANs was noise-contrastive estimation, which uses the same loss function as GANs and which Goodfellow studied during his PhD in 2010–2014. Others had similar ideas but did not develop them as far. An idea involving adversarial networks was published in a 2010 blog post by Olli Niemitalo. This idea was never implemented and did not involve
stochasticity Stochastic (, ) refers to the property of being well described by a random probability distribution. Although stochasticity and randomness are distinct in that the former refers to a modeling approach and the latter refers to phenomena themselve ...
in the generator and thus was not a generative model. It is now known as a conditional GAN or cGAN. An idea similar to GANs was used to model animal behavior by Li, Gauci and Gross in 2013.
Adversarial machine learning Adversarial machine learning is the study of the attacks on machine learning algorithms, and of the defenses against such attacks. A recent survey exposes the fact that practitioners report a dire need for better protecting machine learning syste ...
has other uses besides generative modeling and can be applied to models other than neural networks. In control theory, adversarial learning based on neural networks was used in 2006 to train robust controllers in a game-theoretic sense, by alternating iterations between a minimizer policy (the controller) and a maximizer policy (the disturbance). In 2017, a GAN was used for image enhancement focusing on realistic textures rather than pixel accuracy, producing higher image quality at high magnification. The same year, the first GAN-generated human faces were produced; these were exhibited in February 2018 at the Grand Palais. Faces generated by StyleGAN in 2019 drew comparisons with
Deepfake Deepfakes (a portmanteau of "deep learning" and "fake") are synthetic media in which a person in an existing image or video is replaced with someone else's likeness. While the act of creating fake content is not new, deepfakes leverage powerful ...
s. Beginning in 2017, GAN technology entered the fine arts arena with a newly developed implementation said to have crossed the threshold of generating unique and appealing abstract paintings, dubbed a "CAN", for "creative adversarial network". A GAN system was used to create the 2018 painting ''
Edmond de Belamy ''Edmond de Belamy'' is a generative adversarial network portrait painting constructed in 2018 by Paris-based arts-collective ''Obvious.'' Printed on canvas, the work belongs to a series of generative images called La Famille de Belamy. The name B ...
,'' which sold for US$432,500. An early 2019 article by members of the original CAN team discussed further progress with that system, and also considered the overall prospects for AI-enabled art. In May 2019, researchers at Samsung demonstrated a GAN-based system that produces videos of a person speaking, given only a single photo of that person. In August 2019, a large dataset of 12,197 MIDI songs, each with paired lyrics and melody alignment, was created for neural melody generation from lyrics using a conditional GAN-LSTM (refer to sources at GitHub:
AI Melody Generation from Lyrics
). In May 2020,
Nvidia Nvidia CorporationOfficially written as NVIDIA and stylized in its logo as VIDIA with the lowercase "n" the same height as the uppercase "VIDIA"; formerly stylized as VIDIA with a large italicized lowercase "n" on products from the mid 1990s to ...
researchers taught an AI system (termed "GameGAN") to recreate the game of ''
Pac-Man originally called ''Puck Man'' in Japan, is a 1980 maze action video game developed and released by Namco for arcades. In North America, the game was released by Midway Manufacturing as part of its licensing agreement with Namco America. Th ...
'' simply by watching it being played.


External links

* This Person Does Not Exist – photorealistic images of people who do not exist, generated by StyleGAN
* This Cat Does Not Exist – photorealistic images of cats that do not exist, generated by StyleGAN